Skip to content

feat(scraper-create): write {collector_id, ...} envelope to -o#6

Open
anil-bd wants to merge 1 commit into
brightdata:mainfrom
anil-bd:feat/scraper-create-envelope
Open

feat(scraper-create): write {collector_id, ...} envelope to -o#6
anil-bd wants to merge 1 commit into
brightdata:mainfrom
anil-bd:feat/scraper-create-envelope

Conversation

@anil-bd
Copy link
Copy Markdown

@anil-bd anil-bd commented May 18, 2026

Today the file -o create.json writes only the final AI-progress payload — no collector_id, no name, no view_url. The documented recipe in references/recipes.md depends on jq reading the collector_id out of that file:

COLLECTOR_ID=$(jq -r '.collector_id // .id' create.json)
bdata scraper run "$COLLECTOR_ID" ...

Today that returns the string "null" because the field doesn't exist in the file. Every script that follows the docs to chain create → run is silently broken.

This change wraps every termination path (success, AI-trigger failure, status=failed, polling exception) in one envelope:

{
  "collector_id":    "c_...",
  "name":            "audit-r4-...",
  "status":          "done" | "failed" | "ai_trigger_failed" | "poll_failed",
  "completed_steps": [...],
  "view_url":        "https://brightdata.com/cp/scrapers/c_...",
  "created_at":      "2026-05-18T07:28:30Z",
  "error":           "..."   // failure paths only
}

Notable design choices:

  • Every termination path writes the same shape, including failure paths that previously wrote nothing. So a script using jq -r '.collector_id' always recovers an id when one exists — even from a stub collector that hit the AI-Flow parallel-job cap. This makes good on SKILL.md's promise that every failure path surfaces the collector_id.
  • view_url is included on every envelope so the user has a one- click recovery path to inspect / finish / delete the scraper in the dashboard, without needing to know the URL pattern.
  • created_at is taken from the template-creation response when the API provides it (Create_template_response.created), omitted otherwise — never invented.
  • New --legacy-output flag preserves today's bare-progress shape for one minor version so any existing scripts that depended on the old shape have a migration window. Slated for removal in the next major.
  • Stdout (the success summary printed to TTY) is unchanged. Only the machine-readable -o / --json / --pretty payload is reshaped.
  • Scoped strictly to src/commands/scraper.ts and the new envelope type. The shared HTTP client and other commands (scrape, search, discover, pipelines, browser) are untouched.

Tests: 4 new build_create_envelope unit cases covering success, omitted-created_at, failure-with-error, and view_url-on-every- path. 5 new handle_create_scraper integration cases covering success envelope, the documented jq recipe, --legacy-output preserving the bare shape, AI-trigger failure envelope (the stub-collector recovery path), poll-status-failed envelope, and poll-exception envelope. Two existing tests updated from strict opts-object matches to objectContaining-style (the contract is now the envelope shape, not the bare payload).

55 / 55 scraper tests pass. The 9 pre-existing failures in unrelated suites (daemon, add-mcp, browser, discover, scrape) on main are unchanged by this PR.

Spec: brightdata/skills repo, proposal at
skills/scraper-studio/proposals/PR-2-create-envelope.md (to be filed alongside this PR).

Today the file `-o create.json` writes only the final AI-progress
payload — no `collector_id`, no name, no view_url. The documented
recipe in references/recipes.md depends on jq reading the
collector_id out of that file:

    COLLECTOR_ID=$(jq -r '.collector_id // .id' create.json)
    bdata scraper run "$COLLECTOR_ID" ...

Today that returns the string "null" because the field doesn't exist
in the file. Every script that follows the docs to chain create →
run is silently broken.

This change wraps every termination path (success, AI-trigger
failure, status=failed, polling exception) in one envelope:

    {
      "collector_id":    "c_...",
      "name":            "audit-r4-...",
      "status":          "done" | "failed" | "ai_trigger_failed" | "poll_failed",
      "completed_steps": [...],
      "view_url":        "https://brightdata.com/cp/scrapers/c_...",
      "created_at":      "2026-05-18T07:28:30Z",
      "error":           "..."   // failure paths only
    }

Notable design choices:

* Every termination path writes the same shape, including failure
  paths that previously wrote nothing. So a script using
  `jq -r '.collector_id'` always recovers an id when one exists —
  even from a stub collector that hit the AI-Flow parallel-job cap.
  This makes good on SKILL.md's promise that every failure path
  surfaces the collector_id.
* `view_url` is included on every envelope so the user has a one-
  click recovery path to inspect / finish / delete the scraper in
  the dashboard, without needing to know the URL pattern.
* `created_at` is taken from the template-creation response when
  the API provides it (`Create_template_response.created`),
  omitted otherwise — never invented.
* New `--legacy-output` flag preserves today's bare-progress shape
  for one minor version so any existing scripts that depended on
  the old shape have a migration window. Slated for removal in
  the next major.
* Stdout (the success summary printed to TTY) is unchanged. Only
  the machine-readable `-o` / `--json` / `--pretty` payload is
  reshaped.
* Scoped strictly to `src/commands/scraper.ts` and the new
  envelope type. The shared HTTP client and other commands
  (scrape, search, discover, pipelines, browser) are untouched.

Tests: 4 new `build_create_envelope` unit cases covering success,
omitted-created_at, failure-with-error, and view_url-on-every-
path. 5 new `handle_create_scraper` integration cases covering
success envelope, the documented jq recipe, --legacy-output
preserving the bare shape, AI-trigger failure envelope (the
stub-collector recovery path), poll-status-failed envelope, and
poll-exception envelope. Two existing tests updated from strict
opts-object matches to objectContaining-style (the contract is
now the envelope shape, not the bare payload).

55 / 55 scraper tests pass. The 9 pre-existing failures in
unrelated suites (daemon, add-mcp, browser, discover, scrape) on
main are unchanged by this PR.

Spec: brightdata/skills repo, proposal at
skills/scraper-studio/proposals/PR-2-create-envelope.md (to be
filed alongside this PR).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant